Image compression using a color visual model

ABSTRACT

A system for coding images, and more particularly, to a system for compressing images to a reduced number of bits by employing a Discrete Cosine Transform (DCT) in combination with a visual model.

BACKGROUND OF THE INVENTION

This application claims the benefit of 60/441,583, filed Jan. 21, 2003, entitled Automatic Image Compression Using A Color Visual Model.

The present invention relates to a system for coding images, and more particularly, to a system for compressing images to a reduced number of bits by employing a Discrete Cosine Transform (DCT) in combination with a visual model.

There has been significant development in the compression of digital information for digital images. The effective compression of digital information is important to maintain sufficient quality of the digital image while at the same time reducing the amount of data required for representing the digital image. The transmission of digital images has gained particular importance in television systems and Internet based transmission. If the digital images include a relatively large number of bits to represent the digital images, a significant burden is placed on the infrastructure of communication networks involved with the creation, transmission, and re-creation of digital images. For this reason, there is a need to compress digital images to a smaller number of bits, by reducing redundancy and “invisible” image components of the images themselves.

Still image compression techniques, such as JPEG, compress digital information for digital images. As in digital compression for the transmission of digital video, JPEG compression includes a tradeoff between file size and compressed image quality. For example, JPEG compression is extensively used in digital cameras, Internet based applications, and databases containing digital images.

Many of the image compression techniques, such as JPEG and MPEG, include a transform coding algorithm for the digital image, wherein the image is divided into blocks of pixels. For example, each block of pixels may be an 8×8 or 16×16 block of pixels. Each block of pixels then undergoes a two dimensional transform to produce a two dimensional array of transform coefficients. For many image coding applications, a Discrete Cosine Transform (DCT) is utilized to provide an orthogonal transform. After the block of pixels undergoes a Discrete Cosine Transform (DCT), the resulting transform coefficients are subject to compression by thresholding and quantization operations. Thresholding involves setting to zero all coefficients whose magnitude is smaller than a threshold value, whereas quantization involves scaling a coefficient by a step size and rounding off to the nearest integer.
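The transform, thresholding, and quantization steps described above can be sketched as follows. This is a minimal illustration in plain Python, assuming an orthonormal DCT-II and a uniform step size of 16; the function names, the threshold of 1.0, and the flat test block are hypothetical choices made for the example and are not taken from any standard or from the present disclosure.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        # Normalization that makes the transform orthogonal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def threshold(coeffs, t):
    """Set every coefficient whose magnitude is below t to zero."""
    return [[c if abs(c) >= t else 0.0 for c in row] for row in coeffs]

def quantize(coeffs, q_table):
    """Divide each coefficient by its step size and round to an integer level."""
    return [[int(round(c / q)) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, q_table)]

# Hypothetical flat 8x8 block: all of its energy lands in the DC coefficient.
flat_block = [[100.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat_block)
q_table = [[16] * 8 for _ in range(8)]  # illustrative uniform step size
levels = quantize(threshold(coeffs, 1.0), q_table)
```

For this flat block the DC coefficient is 800, so the only nonzero quantized level is the DC level of 50; a real image block would spread energy across many coefficients.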

Commonly, the quantization of each DCT coefficient is determined by an entry in a quantization matrix (Q-table). A quantization matrix includes a plurality of values that is used to group a set of values together. For example, a quantization matrix may be used to group the values from 0 to 3 into group 1, values from 3-6 into group 2, and values from 6-9 into group 3. It is this matrix that is primarily responsible for the perceived image quality and the bit rate of the transmission of the image. The perceived image quality is important because the human visual system can only tolerate a certain amount of image degradation without significantly observing a noticeable error. Therefore, certain images can tolerate significant degradation and thus be significantly compressed, whereas other images cannot tolerate significant degradation and should not be significantly compressed.

Some systems include computing a single DCT quantization matrix based on human sensitivity. One such system is based on a mathematical formula for the human contrast sensitivity function, scaled for viewing distance and display resolution, as taught in U.S. Pat. No. 4,780,761. Another such system is based on a formula for the visibility of individual DCT basis functions, as a function of viewing distance, display resolution, and display luminance. The formula is disclosed in both a first article entitled “Luminance-Model-Based DCT Quantization For Color Image Compression” of A. J. Ahumada et al. published in 1992 in the Human Vision, Visual Processing, and Digital Display III Proc. SPIE 1666, Paper 32, and a second technical article entitled “An Improved Detection Model for DCT Coefficient Quantization” of H. A. Peterson, et al., published in 1993, in Human Vision, Visual Processing and Digital Display VI Proc. SPIE. Vol. 1913 pages 191-201. The techniques described in the '761 patent and the two technical articles do not adapt the quantization matrix to the image being compressed.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer network that may be used in the practice of the present invention.

FIG. 2 schematically illustrates a block diagram of an image encoding system.

FIG. 3 schematically illustrates the comparison of a pair of images.

DETAILED DESCRIPTION OF THE INVENTION

Referring to FIG. 1, a block diagram of a computer network 10 for the storing, retrieving, and transmitting of images is illustrated. A pair of image processing devices 12 and 14 are provided. The image processing device 12 may be used to perform a storage mode 16 and a retrieval mode 18 operation of the network 10 and, similarly, the image processing device 14 may be used to perform a storage mode 16 and a retrieval mode 18 operation of the network 10. The storage mode 16 accesses a disk subsystem 20, whereas the retrieval mode 18 recovers information from the disk subsystem 20. Each of the devices 12 and 14 may be any type of processing device, or otherwise a single processing device including the functionality of both devices 12 and 14. The devices 12 and 14 may further include a RAM 26, a communication channel 22, a CPU processor 24, and a display subsystem 28.

In general the system may include, in part, a compression technique that incorporates a Discrete Cosine Transform (DCT). In the storage mode 16, an image 30 including a plurality of pixels, represented by a plurality of digital bits, is received from any suitable source through the communication channel 22 of the device 12. The device, and in particular the CPU processor 24, performs a DCT transformation, computes a DCT mask, if desired, selects a quantization matrix, and estimates a quantization matrix optimizer. The device 12 then quantizes the digital bits comprising the image 30, and performs encoding of the resulting quantized DCT coefficients, such as, for example, by run-length encoding, Huffman coding, or arithmetic coding. The resulting quantization matrix is then stored in coded form along with coded coefficient data using any suitable technique, such as the JPEG standard. The compressed file is then stored on the disk subsystem 20 of the device 12, or otherwise transmitted to another device.

In the retrieval mode 18, the device 12 (or 14) retrieves the compressed file from the disk subsystem 20, and decodes the quantization matrix and the DCT coefficient data. The device 12 (or 14) then de-quantizes the coefficients by multiplying them by the resulting scaled quantization matrix and performs an inverse DCT. The resulting digital file containing pixel data is available for display on the display subsystem 28 of the device 12 (or 14) or can be transmitted to the device 14 (or 12) or elsewhere by the communication channel 22. The resulting digital file is illustrated in FIG. 1 as 30′ (IMAGE).
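The de-quantization and inverse transform of the retrieval mode can be sketched as follows. This is a minimal illustration in plain Python, assuming an orthonormal inverse DCT (DCT-III) and a uniform step size of 16; the function names and the sample quantized block are hypothetical and serve only to show the order of operations.

```python
import math

def dequantize(levels, q_table):
    """Multiply each quantized level by its step size from the Q-table."""
    return [[l * q for l, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, q_table)]

def idct_2d(coeffs):
    """Orthonormal 2-D inverse DCT of a square coefficient block."""
    n = len(coeffs)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            s = 0.0
            for u in range(n):
                for v in range(n):
                    s += (alpha(u) * alpha(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[x][y] = s
    return out

# Hypothetical decoded data: only the DC level is nonzero.
levels = [[0] * 8 for _ in range(8)]
levels[0][0] = 50
q_table = [[16] * 8 for _ in range(8)]  # illustrative uniform step size
pixels = idct_2d(dequantize(levels, q_table))
```

With a DC level of 50 and a step size of 16, the recovered DC coefficient is 800 and every reconstructed pixel is approximately 100.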

In some applications, such as digital image database applications, the image may be compressed using a Q-table and then the resulting compressed image is reconstructed and presented to the user. The user then makes adjustments to the Q-table in some fashion and the process is repeated until an acceptable compression of the image is achieved. While this achieves an acceptable result, the process is time consuming, especially for large digital image databases. While it is the case that the appropriate selection of a Q-table (set of values) is desirable, it is problematic to automatically select such a table.

One existing technique for the selection of the Q-table is illustrated in U.S. Pat. No. 5,426,512, incorporated by reference herein. The error resulting from quantization for a given scale factor of the Q-table is scaled in the DCT domain by using a perceptual mask that suppresses some errors and leaves others. The result after applying the mask is then spatially pooled and compared against a target error. If sufficiently close to the target error, then the current Q-table is used to compress the image.

If not sufficiently close, the Q-table is adjusted. The model used is based upon a mean block luminance (for light adaptation) and a DCT coefficient that depends on thresholds based on coefficient amplitudes (for masking).

After consideration of using a visual model within the compression process for Q-table optimization and comparison of DCT coefficients of compressed and uncompressed images, as disclosed in the '512 patent, the present inventors determined that the resulting model does not accurately reflect the user's perception of the images. Moreover, using the visual model within the compression process for Q-table optimization and comparison of DCT coefficients of compressed and uncompressed images, as disclosed in the '512 patent, the present inventors further determined that the model does not take into account the display parameters of the output device, such as the color primaries, the modulation transfer function, resolution (e.g., dpi), and tone scale.

To overcome this limitation the present inventors determined that a model, such as a visual model of the human visual system, should be used as the basis of comparison between uncompressed and compressed images in the spatial domain.

Referring to FIG. 2, the system may include an input image 50 which is to be compressed using different Q-tables (or the same Q-table modified). The discrete cosine transform coefficients 52 are calculated from the input image 50 (which may be in original form or modified by other techniques). Thresholding of the DCT coefficients may be performed, if desired. A set of quantization tables (Q-tables) 54, 56, 58, and 60 are used to quantize the discrete cosine transform coefficients. Larger values in the Q-table typically result in a smaller compressed file size, with larger compression artifacts. Similarly, smaller values in the Q-table typically result in a larger compressed file size, with smaller compression artifacts. The present inventors came to the realization that an “optimal” Q-table is not only dependent on the viewing condition, but is also dependent on the image itself. In the preferred embodiment, a set of four Q-tables may be used based upon the human visual contrast sensitivity function (CSF) using different viewing distances (such as 11, 14, 17, and 19 inches). The resolution of the intended display, the modulation transfer function of the display, the display luminance characteristics of the display, the display color gamut of the display, and the tone response curve of the display may be taken into consideration when creating the Q-tables. For example, closer viewing distances will result in a flatter Q-table in the frequency domain, while farther viewing distances will yield a steeper Q-table in which the higher order DCT coefficients are quantized more aggressively (with respect to the flatter Q-table).

The resulting set of Q-tables include characteristics that account for one or more of the following properties, such as for example, the contrast sensitivity function of the human visual system, the viewing distances, resolution of the intended display, the display luminance characteristics of the display, the display color gamut of the display, the tone response curve of the display, and the modulation transfer function of the display. In this manner, the Q-table is different than it would have been had one or more of these factors been omitted or added.

The DCT coefficients, and hence the resulting image after encoding, are compressed to substantially the same compression ratio. For example, each (or a plurality) of the resulting images may be within 25% of the same size, within 10% of the same size, or within 5% of the same size. To achieve sufficient similarity in compression ratio the Q-table may be scaled and the image recompressed. Accordingly, the effect of each Q-table for compressing a particular image may be more effectively compared against the effect of other Q-tables if the resulting compressed image has a sufficiently similar compression ratio.
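The size-matching check and the Q-table scaling described above might be sketched as follows. This is only an illustration under stated assumptions: the text does not define how "within 5% of the same size" is measured, so the sketch compares every size against the largest one, and the function names, the rounding rule, and the byte counts are all hypothetical.

```python
def scale_q_table(q_table, factor):
    """Multiply every step size by factor, rounding half up, keeping steps >= 1."""
    return [[max(1, int(q * factor + 0.5)) for q in row] for row in q_table]

def sizes_match(sizes, tolerance):
    """True if every size is within `tolerance` (a fraction) of the largest size.

    Measuring against the largest size is an assumption of this sketch.
    """
    largest = max(sizes)
    return all((largest - s) / largest <= tolerance for s in sizes)

# Hypothetical compressed sizes, in bytes, for four candidate Q-tables.
sizes = [10200, 9800, 10050, 9900]
ok_5 = sizes_match(sizes, 0.05)   # candidates are comparable at 5%
ok_3 = sizes_match(sizes, 0.03)   # but not at 3%

# A table that compresses too little can be scaled up and the image recompressed.
coarser = scale_q_table([[16, 11], [12, 14]], 2.0)
```

When the check fails, the table whose output is too large or too small would be rescaled and the image recompressed until the sizes are comparable.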

A model 62, 64, 66, and 68, such as a color visual difference model, may be used to compare the differences between the original image 50 (or otherwise an image that has not been compressed) and a decompressed version of the respective image after quantization using the respective Q-table 54, 56, 58, 60. A color visual difference model simulates the visual perception of the human eye. One such model is X. Feng, J. Speigel, and A. Morimoto, “Halftone image quality evaluation using color visual models”, Proc. Of PICS 2002, p 5-10, 2002, incorporated by reference herein. Such a model collapses to CIELAB for large patches of color. The model may be calibrated so that the threshold occurs at delta E 1.0, regardless of the frequency and background.

The model, based upon the viewing condition and display characteristics, may calculate the visibility of the differences as a function of location in the image. The result may be a set of values, or for JPEG a single number, from the visual difference map. A variety of different metrics may be used, such as root mean square, median, 90th percentile, and 99th percentile. In the preferred embodiment, the 99th percentile is used and the threshold may be set to 1 delta E unit, which is approximately the visual detection threshold. The threshold may be adjusted higher for applications where quality is not critical and storage is at a premium. The threshold may also be adjusted lower for applications where quality is critical, or where the JPEG images may be viewed at a close distance.
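Pooling a visual difference map into a single number can be sketched as follows. This is a minimal illustration in plain Python, assuming the map is a list of rows of delta E values and using the nearest-rank definition of a percentile (one of several common definitions); the function names and the sample map are hypothetical.

```python
import math
import statistics

def percentile(values, p):
    """Nearest-rank percentile of a list of numbers, for integer p in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def pool(diff_map):
    """Reduce a 2-D visual difference map to the metrics named in the text."""
    flat = [v for row in diff_map for v in row]
    return {
        "rms": math.sqrt(sum(v * v for v in flat) / len(flat)),
        "median": statistics.median(flat),
        "p90": percentile(flat, 90),
        "p99": percentile(flat, 99),
    }

# Hypothetical 10x10 map in delta E units: small errors plus two visible artifacts.
diff_map = [[0.2] * 10 for _ in range(10)]
diff_map[3][4] = 2.5
diff_map[7][1] = 2.5
metrics = pool(diff_map)
visible = metrics["p99"] > 1.0  # threshold of 1 delta E unit
```

In this example the median and 90th percentile stay at 0.2 while the 99th percentile rises to 2.5, which shows why a high percentile is sensitive to localized artifacts that an average would dilute.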

Once the Q-table has been selected at block 70 based upon some criteria, the image 50 is compressed using a DCT, the selected Q-table, and encoding of the data, at block 72. The resulting image is then reconstructed and compared against the image 50 using a model, such as the color visual difference model at block 74. If the resulting error metric E at block 76 is smaller than a low threshold (such as a threshold minus a tolerance value which may be within approximately 5% of the tolerance, if desired) then a scaling factor that scales the values in the Q-table is checked at block 78 to see if it is greater than a maximum value. The scaling factor scales the Q-table in some manner and thus controls the amount of compression, which impacts the resulting image quality. If the scaling factor is not greater than a maximum value then the scaling factor is increased at block 80. Thus, block 80 results from the case when the compression artifacts are below the visual threshold based upon some viewing condition and/or display. Therefore, the image may be compressed further to reduce the compressed image size by increasing the scale factor. The selected Q-table is then re-scaled using the modified scaling factor and the image 50 is then re-quantized using the modified Q-table. The quantized image is then reconstructed and evaluated against the image 50 using a model, such as the color visual difference model at block 74.

The error metric is computed at block 76 and if the error is greater than a high threshold (such as a threshold plus a tolerance value) then the scaling factor that scales the values in the Q-table is checked at block 82 to see if it is smaller than a minimum value. If the scaling factor is not less than the minimum value then the scaling factor is decreased at block 84. Thus, block 84 results from the case when the compression artifacts are above the visual threshold based upon some viewing condition and/or display. Therefore, the image may be compressed less, increasing the compressed image size, by decreasing the scale factor. The selected Q-table is then re-scaled using the modified scaling factor and the image 50 is then re-quantized using the modified Q-table. The quantized image is then reconstructed and evaluated against the image 50 using the color visual difference model at block 74.

The error metric is computed at block 76 and if the error is within tolerances a suitable Q-table and scaling factor (or otherwise modified Q-table) is selected. The image may be saved in a suitable file format, such as JPEG, or otherwise transmitted to a suitable destination at block 86.
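The scaling-factor adjustment of blocks 76 through 84 can be sketched as a simple loop. This is a hedged illustration only: the text does not say by how much the scaling factor is increased or decreased, so the fixed step, the default limits, the iteration cap, and the function name `tune_scale` are assumptions. The callable `measure_error` is a hypothetical stand-in for compressing, reconstructing, and evaluating the image with the visual difference model (blocks 72 through 76).

```python
def tune_scale(measure_error, threshold=1.0, tolerance=0.05, scale=1.0,
               step=0.1, min_scale=0.1, max_scale=4.0, max_iters=100):
    """Adjust the Q-table scaling factor until the error is within tolerance.

    measure_error(scale) is assumed to return the pooled error metric for the
    image compressed with the Q-table scaled by `scale`.
    """
    low, high = threshold - tolerance, threshold + tolerance
    error = measure_error(scale)
    for _ in range(max_iters):
        if error < low and scale < max_scale:
            # Artifacts below the visual threshold: compress further (block 80).
            scale = min(scale + step, max_scale)
        elif error > high and scale > min_scale:
            # Artifacts above the visual threshold: compress less (block 84).
            scale = max(scale - step, min_scale)
        else:
            break  # within tolerance, or the scaling factor hit a limit
        error = measure_error(scale)
    return scale, error

# Hypothetical error model in which the error grows linearly with the scale.
scale, error = tune_scale(lambda s: 0.5 * s)
```

With this toy error model the loop stops once the error falls inside the band from 0.95 to 1.05. A fixed step can oscillate around the band if it is too coarse, which is why the sketch caps the number of iterations.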

In another embodiment, the Q-tables may be based upon other criteria. For example, the Q-tables may represent different power spectra in the image to be compressed. This aspect relates to masking, which in turn relates to supra-threshold perception (i.e., in supra-threshold perception, the contrast is higher and more masking typically occurs). As the level of overall masking occurring in an image rises, the variation in sensitivities of the spatial frequency channels decreases. This implies a flatter Q-table will be appropriate for that image. In the case of images with very special characteristics, such as an application that has many images of striated texture (microscopic medical images), the tables may reflect the oriented textures as well, and additional tables may be desirable.

Referring to FIG. 3, a graphical illustration is provided of one embodiment of a portion of the system. As illustrated, an original image 100 is encoded 102, such as by a JPEG encoder. The encoded image is then reconstructed 104. The original image 100 and the reconstructed image 104 are modeled, such as by a color visual difference model 106. The model 106 provides a visual difference map of the image 108 from which an error metric 110 may be obtained.

1. An image encoding system comprising: (a) providing a first image; (b) quantizing a discrete cosine transform of said first image using a first set of quantization values; (c) quantizing said discrete cosine transform of said first image using a second set of quantization values; (d) comparing said first image to a spatial reconstructed image based upon said first set of quantization values using a model; (e) comparing said first image to a spatial reconstructed image based upon said second set of quantization values using said model; (f) selecting one of said first set of quantization values and said second set of quantization values based upon respective said comparing.

2. The method of claim 1 wherein said discrete cosine transform results in a matrix of values.

3. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, the color primaries of a display.

4. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, the modulation transfer function of a display.

5. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, a tone scale of a display.

6. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, the resolution of a display.

7. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, a particular viewing distance for viewing the display.

8. The method of claim 1 wherein said comparing is based upon, at least in part, a contrast sensitivity function of the human visual system.

9. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, a color gamut of a display.

10. The method of claim 1 wherein said comparing is based upon, at least in part, a contrast sensitivity difference model.

11. The method of claim 10 wherein said model collapses to CIELAB for large patches of color.

12. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, viewing conditions and image-structure dependent.

13. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, a luminance response of a display.

14. The method of claim 1 wherein said selecting is based upon an error measure.

15. The method of claim 1 further comprising determining a first error measure based upon said comparing of said first set and a second error measure based upon said comparing of said second set.

16. The method of claim 15 wherein said selecting is based upon said first and second error measures.

17. The method of claim 16 further comprising modifying said selected set of quantization values based upon said error measure.

18. The method of claim 17 further comprising modifying said image based upon said modified selected set of quantization values.

19. The method of claim 18 wherein said modified image is encoded.

20. An image encoding system comprising: (a) providing a first image; (b) quantizing a discrete cosine transform of said first image using a first set of quantization values; (c) comparing said first image to a spatial reconstructed image based upon said first set of quantization values using a model to determine an error measure; (d) based upon said error measure modifying said first set of quantization values; and (e) quantizing said discrete cosine transform of said first image using said modified first set of quantization values.

21. The method of claim 20 wherein a scaling factor is selectively increased based upon said error measure.

22. The method of claim 21 wherein said scaling factor is selectively decreased based upon said error measure.

23. The method of claim 21 wherein said error measure is selectively increased provided said error measure is less than a threshold.

24. The method of claim 22 wherein said error measure is selectively decreased provided said error measure is greater than a threshold.

25. An image encoding system comprising: (a) providing a first image; (b) quantizing a discrete cosine transform of said first image using a first set of quantization values; (c) quantizing said discrete cosine transform of said first image using a second set of quantization values; (d) comparing said first image to a spatial reconstructed image based upon said first set of quantization values using a model to determine an error measure; (e) comparing said first image to a spatial reconstructed image based upon said second set of quantization values using said model to determine an error measure; (f) selecting one of said first set of quantization values and said second set of quantization values based upon respective said error measures; (g) based upon said error measure modifying a respective set of quantization values; (h) quantizing said discrete cosine transform of said first image using said modified set of quantization values.